A Comparative Performance Analysis of Explainable Machine Learning Models With And Without RFECV Feature Selection Technique Towards Ransomware Classification
Ransomware has recently emerged as one of the major global threats. The
alarming rise in ransomware attacks and new ransomware variants
prompts researchers in this domain to continually examine the
distinguishing traits of ransomware and refine their detection or
classification strategies. Among the broad range of behavioral
characteristics, Application Programming Interface (API) calls and
network behaviors have been widely used as differentiating factors for
ransomware detection or classification. Many prior approaches
have shown promising results in detecting and classifying ransomware families
using these features without applying any feature selection technique.
Feature selection is nevertheless a potentially valuable step toward an efficient
detection or classification Machine Learning model because it reduces the
probability of overfitting by removing redundant data, improves the model's
accuracy by eliminating irrelevant features, and reduces training
time. Many feature selection techniques are already
used in different security scenarios to optimize the performance
of the Machine Learning models. Hence, this study presents a
comparative performance analysis of widely used Supervised Machine Learning
models with and without the RFECV feature selection technique for ransomware
classification using API call and network traffic features. It thereby
provides insight into the efficiency of the RFECV feature selection
technique for ransomware classification, which peers can use
as a reference when choosing a feature selection technique in
this domain.
Comment: arXiv admin note: text overlap with arXiv:2210.1123
Is iterative feature selection technique efficient enough? A comparative performance analysis of RFECV feature selection technique in ransomware classification using SHAP
Early ransomware detection is of significant importance in cybersecurity. Feature selection is critical in this context, as it enhances detection accuracy, mitigates overfitting, and reduces training time by eliminating irrelevant and redundant data. However, iterative feature selection techniques choose the best-performing subset of features through an iterative process, which leaves a chance that a crucial feature is not selected, and the number of selected features may not always be optimal or the most suitable for a given problem. Hence, this study conducts a comparative performance analysis of an iterative feature selection technique, Recursive Feature Elimination with Cross-Validation (RFECV), with six supervised Machine Learning (ML) models to evaluate its efficiency in classifying ransomware using Application Programming Interface (API) call and network traffic features. The study employs an Explainable Artificial Intelligence (XAI) framework called SHapley Additive exPlanations (SHAP) to derive the crucial features when RFECV is not integrated with the ML models. These features are then compared with the features RFECV selects when it is integrated. Results show that the ML models achieve better classification accuracy without RFECV on two datasets. Furthermore, RFECV falls short of selecting impactful features, leading to more false alarms. Moreover, it lacks the capability to rank the features based on their importance, reducing its overall efficiency in ransomware classification. Thus, this study underscores the importance of integrating explainability techniques to identify critical features, rather than relying solely on iterative feature selection methods, to enhance the resilience of ransomware detection systems.
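The iterative procedure that both abstracts evaluate can be illustrated with a minimal pure-Python sketch of recursive feature elimination with cross-validation. This is not the authors' pipeline and not a library implementation: the toy dataset, the small gradient-descent logistic regression, the use of absolute weights as feature importance, the interleaved folds, and the tie-break toward fewer features are all assumptions made here for illustration, and names such as `make_toy_data` and `rfecv` are hypothetical.

```python
import math

def make_toy_data(n=40):
    """Hypothetical data: feature 0 carries the label, features 1-3 are noise-like."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        signal = (1.0 if label else -1.0) * (1.0 + 0.1 * (i % 5))
        X.append([signal, math.sin(3 * i), math.cos(7 * i), math.sin(11 * i + 1)])
        y.append(label)
    return X, y

def fit_logreg(X, y, cols, epochs=200, lr=0.5):
    """Fit logistic regression on the given columns by batch gradient descent."""
    w = {c: 0.0 for c in cols}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = {c: 0.0 for c in cols}
        grad_b = 0.0
        for row, label in zip(X, y):
            z = b + sum(w[c] * row[c] for c in cols)
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for c in cols:
                grad_w[c] += err * row[c]
        b -= lr * grad_b / n
        for c in cols:
            w[c] -= lr * grad_w[c] / n
    return w, b

def accuracy(w, b, X, y, cols):
    """Fraction of rows whose predicted class (z > 0) matches the label."""
    hits = 0
    for row, label in zip(X, y):
        z = b + sum(w[c] * row[c] for c in cols)
        hits += int((z > 0) == bool(label))
    return hits / len(X)

def elimination_path(X, y, n_features):
    """Refit at every subset size, dropping the feature with the smallest |weight|."""
    cols = list(range(n_features))
    path = []
    while cols:
        w, b = fit_logreg(X, y, cols)
        path.append((list(cols), w, b))
        cols.remove(min(cols, key=lambda c: abs(w[c])))
    return path

def rfecv(X, y, n_features, k_folds=5):
    """Score each subset size by k-fold CV, then rerun elimination on all data."""
    fold_scores = {k: [] for k in range(1, n_features + 1)}
    for fold in range(k_folds):
        test_idx = set(range(fold, len(X), k_folds))  # interleaved folds
        X_tr = [r for i, r in enumerate(X) if i not in test_idx]
        y_tr = [l for i, l in enumerate(y) if i not in test_idx]
        X_te = [r for i, r in enumerate(X) if i in test_idx]
        y_te = [l for i, l in enumerate(y) if i in test_idx]
        for cols, w, b in elimination_path(X_tr, y_tr, n_features):
            fold_scores[len(cols)].append(accuracy(w, b, X_te, y_te, cols))
    scores = {k: sum(v) / len(v) for k, v in fold_scores.items()}
    # Assumption: ties in mean CV accuracy are broken toward fewer features.
    best_k = max(scores, key=lambda k: (scores[k], -k))
    selected = next(cols for cols, _, _ in elimination_path(X, y, n_features)
                    if len(cols) == best_k)
    return selected, scores

X, y = make_toy_data()
selected, scores = rfecv(X, y, n_features=4)
print("mean CV accuracy by subset size:", scores)
print("selected feature indices:", selected)
```

In this sketch, `scores[4]` is the cross-validated accuracy with all features kept, which plays the role of the "without RFECV" baseline, while `selected` is the subset the iterative procedure settles on. Because each round discards a feature based only on the current model's weights, a feature dropped early never re-enters, which is one way a crucial feature can be missed. In practice, scikit-learn's `sklearn.feature_selection.RFECV` would normally be used instead of hand-written code like this.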